Background:

Recent studies have identified numerous non-coding loci associated with Alzheimer’s disease (AD) risk, yet their underlying mechanisms and the transcriptional regulatory circuitry specific to AD remain poorly elucidated. In this investigation, we conducted a comprehensive analysis of the epigenomic and transcriptomic landscapes in 850,000 nuclei derived from AD and healthy prefrontal cortex tissues of 92 individuals. Our aim was to construct a detailed map of the brain regulome, encompassing epigenomic profiles, transcriptional regulators, co-accessibility modules, and peak-to-gene links in a cell-type-specific context. To achieve this, we devised novel methods for multimodal integration and the identification of regulatory modules through peak-to-gene linking. Our findings reveal an enrichment of AD risk loci in microglial enhancers and implicate specific transcription factors, such as SPI1, ELF2, and RUNX1. Additionally, we identified 9,628 cell-type-specific ATAC-QTL loci, which, when integrated with peak-to-gene links, allowed us to prioritize regulatory circuits associated with AD variants. Notably, we observed differential accessibility in regulatory modules, with glia showing changes in late-stage AD and neurons exhibiting alterations in early-stage AD. Strikingly, late-stage AD brains displayed a global dysregulation of the epigenome, indicative of epigenomic erosion and loss of cell identity.

Dataset

Load the downsampled RNA-seq-seq dataset, which could be found in the data directory. The dataset consists 0.05% of each cell type proportion out of 414k scRNA-seq dataset which could be found here. The paper is Xiong et al. (2023) . This dataset is originally downsampled per cell type proportion (0.05%), and the original file can be found on the link RNA.h5ad

Load the required libraries for the quality control steps of the scRNA-seq dataset

suppressPackageStartupMessages({
    library("zellkonverter")
    library("ggplot2")
    library("reticulate")
    library("SingleCellExperiment")
    library("AnnotationDbi")
    library("org.Hs.eg.db")
    library("EnsDb.Hsapiens.v86")
    library("scater")
    library("Matrix")
    library("Seurat")
    library("DoubletFinder")
    library("dplyr")
})

Load the dataset using readRDS function. The rds file can be found on the dropbox link above.

scRNA_brain_seurat_QC <- readRDS("../data/scRNA_brain_seurat_downsampled.rds")
DefaultAssay(scRNA_brain_seurat_QC) <- "RNA"

Quality Control

The quality control involves several steps to ensure the quality/reliability of the data.

Initially, we can generate visualisations across selected metadata features whether there are outliers, and assess correlations between different features, subsetting and cleaning the data based on specific criteria, calculating cell cycle scores, identifying variable features, performing data scaling, running PCA for dimensionality reduction, and checking metadata for potential doublets.

Visualisations of selected metadata features.

VlnPlot(scRNA_brain_seurat_QC, features = c("n_genes", "n_counts", "pct_mito", "pct_ribo"), ncol = 4,
    pt.size = 0.1) + theme(plot.title = element_text(size = 10))

Let’s plot some of the metadata features against each other and see how they correlate. The number above each plot is a Pearson correlation coefficient.

FeatureScatter(scRNA_brain_seurat_QC, feature1 = "n_counts", feature2 = "pct_mito")

FeatureScatter(scRNA_brain_seurat_QC, feature1 = "n_counts", feature2 = "n_genes")

FeatureScatter(scRNA_brain_seurat_QC, feature1 = "n_counts", feature2 = "pct_ribo")

FeatureScatter(scRNA_brain_seurat_QC, feature1 = "pct_ribo", feature2 = "pct_mito")

##subset scRNA_brain_seurat
subset(
  scRNA_brain_seurat_QC,
  n_genes>750 & 
    n_genes < 8000 & 
    pct_mito < 10 
  #percent.Largest.Gene < 10
) -> scRNA_brain_seurat_QC

scRNA_brain_seurat_QC
## An object of class Seurat 
## 33538 features across 17883 samples within 1 assay 
## Active assay: RNA (33538 features, 0 variable features)
##  1 layer present: counts
cc.genes.updated.2019$s.genes <- UpdateSymbolList(symbols = cc.genes.updated.2019$s.genes)
cc.genes.updated.2019$g2m.genes <- UpdateSymbolList(symbols = cc.genes.updated.2019$g2m.genes)

s.genes <- cc.genes.updated.2019$s.genes
g2m.genes <- cc.genes.updated.2019$g2m.genes
cc.genes <- cc.genes.updated.2019
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
NormalizeData(scRNA_brain_seurat_QC, normalization.method = "LogNormalize", scale.factor = 10000) ->
    scRNA_brain_seurat_QC
scRNA_brain_seurat_QC <- CellCycleScoring(scRNA_brain_seurat_QC, s.features = s.genes, g2m.features = g2m.genes,
    set.ident = TRUE)

VlnPlot(scRNA_brain_seurat_QC, features = c("S.Score", "G2M.Score"), ncol = 4, pt.size = 0.1)

scRNA_brain_seurat_QC <- FindVariableFeatures(scRNA_brain_seurat_QC, selection.method = "vst", nfeatures = 2000,
    verbose = F)
top10 <- head(VariableFeatures(scRNA_brain_seurat_QC), 10)

plot1 <- VariableFeaturePlot(scRNA_brain_seurat_QC)
LabelPoints(plot = plot1, points = top10, repel = TRUE, xnudge = 0, ynudge = 0)

LabelPoints(plot = plot1, points = top10, repel = TRUE, xnudge = 0, ynudge = 0)

Regress out the read depth, mitochondrial percentage and cell cycling genes

scRNA_brain_seurat_QC = ScaleData(scRNA_brain_seurat_QC, vars.to.regress = c("n_genes", "pct_mito",
    "S.Score", "G2M.Score"), verbose = F)

PCA

PCA works by transforming the original gene expression data into a new set of orthogonal variables, called principal components (PCs). These PCs are linear combinations of the original genes and are ordered such that the first PC captures the maximum amount of variance in the data, followed by the second PC capturing the maximum remaining variance orthogonal to the first PC, and so on.

scRNA_brain_seurat_QC <- RunPCA(scRNA_brain_seurat_QC, features = VariableFeatures(object = scRNA_brain_seurat_QC))

Elbow plot is used to determine the optimal number of components or clusters to use in the analysis.

ElbowPlot(scRNA_brain_seurat_QC, ndims = 50)

scRNA_brain_seurat_QC <- RunUMAP(scRNA_brain_seurat_QC, dims = 1:15, verbose = F)
## checking metadata for doublets
head(scRNA_brain_seurat_QC@meta.data)
batch n_counts pct_ribo pct_mito n_genes dataset major.celltype cell_type_high_resolution U1 U2 age_death msex pmi Pathology subject nCount_RNA nFeature_RNA S.Score G2M.Score Phase old.ident
SM_IXFTR_TCCTCGAAGAAGAACG-23 SM_IXFTR 10108 0.0022754 0.0239414 3809 SM_IXFTR-23 Inh Inh L1 PAX6 CA4 3.4924433 15.911015 90+ 0 5 lateAD ROSMAP-59721 11138 4148 0.0496705 -0.0093462 S SM
SM_IXFTR_GCATTAGAGCTTCTAG-32 SM_IXFTR 13255 0.0044512 0.0103357 4238 SM_IXFTR-32 Oli Oli RASGRF1 8.8577060 7.718446 80-85 1 14 lateAD ROSMAP-10859 14688 4732 0.0567775 0.0451328 S SM
SM_IXFTU_GGGTGAAAGGCCACTC-8 SM_IXFTU 5719 0.0034971 0.0183599 2683 SM_IXFTU-8 Exc Exc L2-3 CBLN2 LINC02306 -0.3858942 7.867011 85-90 1 6 nonAD ROSMAP-44788 6375 2930 0.0274323 -0.0357194 S SM
SM_IXFTR_AACGAAAGTCATGACT-23 SM_IXFTR 5868 0.0039196 0.0178937 2490 SM_IXFTR-23 Oli Oli OPALIN 10.7504680 9.076124 90+ 0 5 lateAD ROSMAP-59721 6524 2758 0.0282730 0.0574590 G2M SM
SM_IXFTR_GCCTGTTTCGGACAAG-6 SM_IXFTR 3373 0.0017788 0.0127483 1504 SM_IXFTR-6 Oli Oli OPALIN 6.5767593 10.305967 85-90 1 4 nonAD ROSMAP-22128 3653 1608 0.0242130 0.0577818 G2M SM
SM_190312Tsa_GACGTGCGTAAGGGAA-45 SM_190312Tsa 4510 0.0066519 0.0004435 2498 SM_190312Tsa-45 Exc Exc L3-4 RORB CUX2 -1.2580710 3.195074 80-85 0 10 earlyAD ROSMAP-20005 5358 2820 -0.0068418 0.0555032 G2M SM

Removing doublets

nExp <- round(ncol(scRNA_brain_seurat_QC) * 0.04)  # expect 4% doublets
scRNA_brain_seurat_QC <- doubletFinder(scRNA_brain_seurat_QC, pN = 0.25, pK = 0.09, nExp = nExp,
    PCs = 1:15)
## [1] "Creating 5961 artificial doublets..."
## [1] "Creating Seurat object..."
## [1] "Normalizing Seurat object..."
## [1] "Finding variable genes..."
## [1] "Scaling data..."
## [1] "Running PCA..."
## [1] "Calculating PC distance matrix..."
## [1] "Computing pANN..."
## [1] "Classifying doublets.."
# name of the DF prediction can change, so extract the correct column name.

DF.name = colnames(scRNA_brain_seurat_QC@meta.data)[grepl("DF.classification", colnames(scRNA_brain_seurat_QC@meta.data))]
DimPlot(scRNA_brain_seurat_QC) + NoAxes()

DimPlot(scRNA_brain_seurat_QC, group.by = DF.name) + NoAxes()

VlnPlot(scRNA_brain_seurat_QC, features = "n_genes", group.by = DF.name, pt.size = 0.1)

scRNA_brain_seurat_QC = scRNA_brain_seurat_QC[, scRNA_brain_seurat_QC@meta.data[, DF.name] == "Singlet"]
dim(scRNA_brain_seurat_QC)
## [1] 33538 17168
## save data
saveRDS(scRNA_brain_seurat_QC, file = "../data/scRNA_brain_seurat_QC_fix.rds")
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS 14.2.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] KernSmooth_2.23-22          fields_15.2                
##  [3] viridisLite_0.4.2           spam_2.10-0                
##  [5] dplyr_1.1.4                 DoubletFinder_2.0.4        
##  [7] Seurat_5.0.1                SeuratObject_5.0.1         
##  [9] sp_2.1-3                    Matrix_1.6-5               
## [11] scater_1.26.1               scuttle_1.8.4              
## [13] EnsDb.Hsapiens.v86_2.99.0   ensembldb_2.22.0           
## [15] AnnotationFilter_1.22.0     GenomicFeatures_1.50.4     
## [17] org.Hs.eg.db_3.16.0         AnnotationDbi_1.60.2       
## [19] SingleCellExperiment_1.20.1 SummarizedExperiment_1.28.0
## [21] Biobase_2.58.0              GenomicRanges_1.50.2       
## [23] GenomeInfoDb_1.34.9         IRanges_2.32.0             
## [25] S4Vectors_0.36.2            BiocGenerics_0.44.0        
## [27] MatrixGenerics_1.10.0       matrixStats_1.2.0          
## [29] reticulate_1.35.0           ggplot2_3.4.4              
## [31] zellkonverter_1.8.0        
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.4                spatstat.explore_3.2-6   
##   [3] tidyselect_1.2.0          RSQLite_2.3.5            
##   [5] htmlwidgets_1.6.4         grid_4.2.3               
##   [7] BiocParallel_1.32.6       Rtsne_0.17               
##   [9] munsell_0.5.0             ScaledMatrix_1.6.0       
##  [11] codetools_0.2-19          ica_1.0-3                
##  [13] future_1.33.1             miniUI_0.1.1.1           
##  [15] withr_3.0.0               spatstat.random_3.2-2    
##  [17] colorspace_2.1-0          progressr_0.14.0         
##  [19] filelock_1.0.3            highr_0.10               
##  [21] knitr_1.45                rstudioapi_0.15.0        
##  [23] ROCR_1.0-11               tensor_1.5               
##  [25] listenv_0.9.1             labeling_0.4.3           
##  [27] GenomeInfoDbData_1.2.9    polyclip_1.10-6          
##  [29] farver_2.1.1              bit64_4.0.5              
##  [31] basilisk_1.15.2004        parallelly_1.37.0        
##  [33] vctrs_0.6.5               generics_0.1.3           
##  [35] xfun_0.42                 BiocFileCache_2.6.1      
##  [37] R6_2.5.1                  ggbeeswarm_0.7.2         
##  [39] rsvd_1.0.5                spatstat.utils_3.0-4     
##  [41] bitops_1.0-7              cachem_1.0.8             
##  [43] DelayedArray_0.24.0       promises_1.2.1           
##  [45] BiocIO_1.8.0              scales_1.3.0             
##  [47] beeswarm_0.4.0            gtable_0.3.4             
##  [49] beachmat_2.14.2           globals_0.16.2           
##  [51] goftest_1.2-3             rlang_1.1.3              
##  [53] splines_4.2.3             rtracklayer_1.58.0       
##  [55] lazyeval_0.2.2            spatstat.geom_3.2-8      
##  [57] abind_1.4-5               reshape2_1.4.4           
##  [59] yaml_2.3.8                httpuv_1.6.14            
##  [61] tools_4.2.3               ellipsis_0.3.2           
##  [63] jquerylib_0.1.4           RColorBrewer_1.1-3       
##  [65] ggridges_0.5.6            plyr_1.8.9               
##  [67] Rcpp_1.0.12               sparseMatrixStats_1.10.0 
##  [69] progress_1.2.3            zlibbioc_1.44.0          
##  [71] purrr_1.0.2               RCurl_1.98-1.14          
##  [73] basilisk.utils_1.15.1     prettyunits_1.2.0        
##  [75] deldir_2.0-2              pbapply_1.7-2            
##  [77] viridis_0.6.5             cowplot_1.1.3            
##  [79] zoo_1.8-12                ggrepel_0.9.5            
##  [81] cluster_2.1.6             magrittr_2.0.3           
##  [83] scattermore_1.2           data.table_1.15.0        
##  [85] RSpectra_0.16-1           lmtest_0.9-40            
##  [87] RANN_2.6.1                ProtGenerics_1.30.0      
##  [89] fitdistrplus_1.1-11       hms_1.1.3                
##  [91] patchwork_1.2.0           mime_0.12                
##  [93] evaluate_0.23             xtable_1.8-4             
##  [95] XML_3.99-0.16.1           fastDummies_1.7.3        
##  [97] gridExtra_2.3             compiler_4.2.3           
##  [99] biomaRt_2.54.1            maps_3.4.2               
## [101] tibble_3.2.1              crayon_1.5.2             
## [103] htmltools_0.5.7           later_1.3.2              
## [105] tidyr_1.3.1               DBI_1.2.2                
## [107] formatR_1.14              dbplyr_2.4.0             
## [109] MASS_7.3-60.0.1           rappdirs_0.3.3           
## [111] cli_3.6.2                 parallel_4.2.3           
## [113] dotCall64_1.1-1           igraph_2.0.2             
## [115] pkgconfig_2.0.3           GenomicAlignments_1.34.1 
## [117] dir.expiry_1.6.0          spatstat.sparse_3.0-3    
## [119] plotly_4.10.4             xml2_1.3.6               
## [121] vipor_0.4.7               bslib_0.6.1              
## [123] XVector_0.38.0            stringr_1.5.1            
## [125] digest_0.6.34             sctransform_0.4.1        
## [127] RcppAnnoy_0.0.22          spatstat.data_3.0-4      
## [129] Biostrings_2.66.0         rmarkdown_2.25           
## [131] leiden_0.4.3.1            uwot_0.1.16              
## [133] DelayedMatrixStats_1.20.0 restfulr_0.0.15          
## [135] curl_5.2.0                shiny_1.8.0              
## [137] Rsamtools_2.14.0          rjson_0.2.21             
## [139] nlme_3.1-164              lifecycle_1.0.4          
## [141] jsonlite_1.8.8            BiocNeighbors_1.16.0     
## [143] fansi_1.0.6               pillar_1.9.0             
## [145] lattice_0.22-5            ggrastr_1.0.2            
## [147] KEGGREST_1.38.0           fastmap_1.1.1            
## [149] httr_1.4.7                survival_3.5-8           
## [151] glue_1.7.0                png_0.1-8                
## [153] bit_4.0.5                 stringi_1.8.3            
## [155] sass_0.4.8                blob_1.2.4               
## [157] RcppHNSW_0.6.0            BiocSingular_1.14.0      
## [159] memoise_2.0.1             irlba_2.3.5.1            
## [161] future.apply_1.11.1